9 research outputs found

    Separable Hash Functions

    We introduce a class of hash functions with the property that messages with the same hash are well separated in terms of their Hamming distance. We provide an example of such a function that uses cyclic codes and an elliptic curve group over a finite field. A related problem is ensuring that the consecutive distance (c.d.) between messages with the same hash is as large as possible. We derive bounds on the c.d. separability factor of such hash functions.
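    As a rough illustration of the separation property described in this abstract, the sketch below uses the syndrome of the binary Hamming(7,4) code as a toy hash. This is not the construction from the paper (which uses cyclic codes together with an elliptic curve group); the function names and the choice of code are assumptions made for the example. Two messages with the same syndrome differ by a codeword, so their Hamming distance is at least the minimum distance of the code.

```python
def syndrome_hash(message):
    """Toy hash (an assumption, not the paper's function).

    `message` is a 7-bit integer. The result is its 3-bit syndrome under
    the Hamming(7,4) parity-check matrix whose i-th column is the binary
    representation of i (for i = 1..7).
    """
    syndrome = 0
    for position in range(7):
        if (message >> position) & 1:
            syndrome ^= position + 1
    return syndrome


def hamming_distance(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def min_collision_distance():
    """Smallest Hamming distance over all pairs of distinct colliding messages."""
    best = None
    for a in range(128):
        for b in range(a + 1, 128):
            if syndrome_hash(a) == syndrome_hash(b):
                d = hamming_distance(a, b)
                best = d if best is None else min(best, d)
    return best
```

    Here `min_collision_distance()` returns 3, the minimum distance of the Hamming(7,4) code, so no two colliding 7-bit messages are closer than three bit flips apart.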

    Extracting Features from Textual Data in Class Imbalance Problems

    We address class imbalance problems. These are classification problems where the target variable is binary, and one class dominates over the other. A central objective in these problems is to identify features that yield models with high precision/recall values, the standard yardsticks for assessing such models. Our features are extracted from the textual data inherent in such problems. We use n-gram frequencies as features and introduce a discrepancy score that measures the efficacy of an n-gram in highlighting the minority class. The frequency counts of n-grams with the highest discrepancy scores are used as features to construct models with the desired metrics. According to the best practices followed by the services industry, many customer support tickets will get audited and tagged as contract-compliant, whereas some will be tagged as over-delivered. Based on in-field data, we use a random forest classifier and perform a randomized grid search over the model hyperparameters. The model scoring is performed using a scoring function. Our objective is to minimize the follow-up costs by optimizing the recall score while maintaining a base-level precision score. The final optimized model achieves an acceptable recall score while staying above the target precision. We validate our feature selection method by comparing our model with one constructed using frequency counts of n-grams chosen randomly. We propose extensions of our feature extraction method to general classification (binary and multi-class) and regression problems. The discrepancy score is one measure of dissimilarity of distributions, and other (more general) measures that we formulate could potentially yield more effective models.
    Aravamuthan, S.; Jogalekar, P.; Lee, J. (2022). Extracting Features from Textual Data in Class Imbalance Problems. Journal of Computer-Assisted Linguistic Research. 6:42-58. https://doi.org/10.4995/jclr.2022.18200
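    The feature selection idea in this abstract (score each n-gram by how well it highlights the minority class, then use the counts of the top-scoring n-grams as features) can be sketched as follows. The abstract does not define the discrepancy score, so the formula used here is a stand-in assumption: the share of minority-class documents containing the n-gram minus the share of majority-class documents containing it. All function names are hypothetical.

```python
from collections import Counter


def ngrams(text, n):
    """Whitespace-tokenised, lower-cased word n-grams of a text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def discrepancy_scores(texts, labels, n=2):
    """Assumed score, NOT the paper's definition.

    labels use 1 for the minority class and 0 for the majority class.
    score(g) = P(doc contains g | minority) - P(doc contains g | majority)
    """
    n_min = sum(1 for label in labels if label == 1)
    n_maj = len(labels) - n_min
    if n_min == 0 or n_maj == 0:
        raise ValueError("both classes must be present")
    minority, majority = Counter(), Counter()
    for text, label in zip(texts, labels):
        target = minority if label == 1 else majority
        target.update(set(ngrams(text, n)))  # document frequency, not raw count
    vocabulary = set(minority) | set(majority)
    return {g: minority[g] / n_min - majority[g] / n_maj for g in vocabulary}


def top_ngrams(scores, k):
    """The k n-grams with the highest score (ties broken alphabetically)."""
    return sorted(scores, key=lambda g: (-scores[g], g))[:k]


def featurize(text, selected, n=2):
    """Frequency counts of the selected n-grams in one text."""
    counts = Counter(ngrams(text, n))
    return [counts[g] for g in selected]
```

    The resulting count vectors could then be passed to any classifier; the abstract reports using a random forest tuned by randomized search, which is not reproduced here.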

    The Average Transmission Overhead for Broadcast Encryption

    We consider broadcast encryption schemes wherein a center needs to broadcast a secret message to a privileged set of receivers. We prescribe a probability distribution on the privileged set

    Covering Codes for Hats-on-a-line

    We consider a popular game puzzle, called Hats-on-a-line, wherein a warden has n prisoners, each one wearing a randomly assigned black or white hat, stand in a line. Thus each prisoner can see the colors of all hats before him, but not his own or those of the prisoners behind him. Everyone can hear the answer called out by each prisoner. Based on this information and without any further communication, each prisoner has to call out his hat color, starting from the back of the line. If he gets it right, he is released from the prison; otherwise he remains incarcerated forever. The goal of the team is to devise a strategy that maximizes the number of correct answers. A variation of this problem asks for the solution for an arbitrary number of colors. In this paper, we study the standard Hats-on-a-line problem and its natural extensions. We demonstrate an optimal strategy when the seeing radius and/or the hearing radius are limited. We show, for certain orderings that arise from a (simulated) game between the warden and prisoners, how this problem relates to the theory of covering codes. Our investigations lead to two optimization problems related to covering codes, one of which leads to an exact solution (for binary codes). For instance, we show that for $0 < k < n$, $(n - k - d) \le \alpha_m n$, where $d = t(n - k, m^k, m)$ is the minimum covering radius of an $m$-ary code of length $(n - k)$ and size $m^k$, and $\alpha_m = \frac{\log m}{\log(m^2 - m + 1)}$.
    Both ATC and TRDDC are research units of Tata Consultancy Services Limited. The Electronic Journal of Combinatorics 13 (2006), #R21.
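    For orientation, the well-known parity strategy for the standard two-color game (unlimited seeing and hearing) can be sketched as below. This is the folklore baseline, not the limited-radius strategies or covering-code constructions of the paper, and the function names and the 0/1 encoding of colors are assumptions. The prisoner at the back announces the parity of the hats he sees; every later prisoner recovers his own color from that parity, the calls he has heard, and the hats in front of him.

```python
def play_parity_strategy(hats):
    """Simulate the parity strategy.

    hats[0] belongs to the prisoner at the back of the line, who calls
    first and sees hats[1:]. Colors are encoded as 0 (white) and 1 (black).
    Returns the list of called colors, in calling order.
    """
    calls = []
    for i in range(len(hats)):
        seen_parity = sum(hats[i + 1:]) % 2
        if i == 0:
            # The back prisoner sacrifices his guess to announce the
            # parity of all hats in front of him.
            calls.append(seen_parity)
        else:
            # Calls 1..i-1 are correct, so they equal hats[1:i].
            heard_parity = sum(calls[1:i]) % 2
            calls.append((calls[0] - heard_parity - seen_parity) % 2)
    return calls


def count_correct(hats, calls):
    """Number of prisoners whose call matches their hat."""
    return sum(h == c for h, c in zip(hats, calls))
```

    Under these assumptions every prisoner except possibly the one at the back answers correctly, so at least n - 1 of the n prisoners are released.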

    A Parallelization of ECDSA Resistant to Simple Power Analysis Attacks

    The Elliptic Curve Digital Signature Algorithm admits a natural parallelization wherein the point multiplication step can be split in two parts and executed in parallel. Further parallelism is achieved by executing a portion of the multiprecision arithmetic operations in parallel with point multiplication. This results in a saving in timing as well as gate count when the two paths are implemented in hardware and software. This article attempts to exploit this parallelism in a typical system context in which a microprocessor is always present though a hardware accelerator is being designed for performance. We discuss some implementation aspects of this design with reference to power analysis attacks. We show how the Montgomery point multiplication and the binary extended gcd algorithms can be adapted to prevent simple power analysis attacks.
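    The reason a Montgomery-style point multiplication helps against simple power analysis is that it performs one point addition and one point doubling for every scalar bit, whatever the bit value. The sketch below shows that ladder structure on a toy short Weierstrass curve. The curve parameters, the base point, and all function names are assumptions chosen for illustration, not taken from the paper, and Python code like this is not itself side-channel safe (the `if` on the bit would be replaced by a constant-time swap in a real design).

```python
# Toy curve y^2 = x^3 + A*x + B over GF(P_MOD); illustrative parameters only.
P_MOD = 97
A, B = 2, 3
INF = None  # point at infinity


def inv(x):
    """Modular inverse in GF(P_MOD) (three-argument pow, Python 3.8+)."""
    return pow(x % P_MOD, -1, P_MOD)


def add(p, q):
    """Affine point addition, including doubling and the identity."""
    if p is INF:
        return q
    if q is INF:
        return p
    x1, y1 = p
    x2, y2 = q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF
    if p == q:
        lam = (3 * x1 * x1 + A) * inv(2 * y1) % P_MOD
    else:
        lam = (y2 - y1) * inv(x2 - x1) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def montgomery_ladder(k, point, bits):
    """Compute k*point with one add and one double per scalar bit."""
    r0, r1 = INF, point  # invariant: r1 - r0 == point
    for i in reversed(range(bits)):
        if (k >> i) & 1:
            r0, r1 = add(r0, r1), add(r1, r1)
        else:
            r0, r1 = add(r0, r0), add(r0, r1)
    return r0


def double_and_add(k, point):
    """Reference method whose operation sequence depends on the bits of k."""
    result, addend = INF, point
    while k:
        if k & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        k >>= 1
    return result
```

    Both routines return the same point for any scalar that fits in `bits` bits, for example with the assumed base point `(3, 6)`; only the ladder keeps the sequence of curve operations independent of the scalar.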